25/03/2021

Recap

Bindings basics

  • Objects/values do not have names, but names have values!
  • Objects have a ‘memory address’ (identifier).
x <- c(1, 2, 3)

Copy-on-modify

  • If we modify values in a vector, actual ‘copying’ is necessary (depending on the data structure of the object…).
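A minimal sketch of the binding and copy-on-modify behaviour recapped above, using base R only. The second name `y` is illustrative, and `tracemem()` only works if R was built with memory profiling, hence the `capabilities("profmem")` guard.

```r
x <- c(1, 2, 3)   # the name x is bound to a vector object
y <- x            # y is bound to the same object; no copy is made yet

# tracemem() reports when the object is duplicated (needs memory profiling support)
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(x))

y[2] <- 10        # modifying through y forces R to copy the vector first

if (can_trace) untracemem(x)

print(x)          # the object bound to x is unchanged
print(y)          # y is now bound to a modified copy
```

If tracing is available, R prints a `tracemem[... -> ...]` line at the moment of the assignment to `y[2]`, which is the point where the copy is made.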

Data structures and modify-in-place

Improving performance

  • Bottleneck(s) identified, what now?
  • See previous examples for typical problems in a data analytics context.
  • Vast variety of potential bottlenecks. Hard to give general advice.

Programming with Big Data

  1. Which basic (already implemented) R functions are more or less suitable as building blocks for the program?
  2. How can we exploit/avoid some of R’s lower-level characteristics in order to implement efficient functions?
  3. Is there a need to interface with a lower-level programming language in order to speed up the code? (advanced topic)
  • Independent of how we write a statistical procedure in R (or in any other language, for that matter): is there an alternative statistical procedure/algorithm that is faster but delivers approximately the same result?
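One possible illustration of the last point, as a sketch rather than the approach taken in the course: estimating a mean from a random subsample instead of the full data. The simulated data, the sample size, and all variable names are assumptions made for this example.

```r
set.seed(1)

# Simulated stand-in for a large variable (hypothetical data)
x <- rnorm(1e6, mean = 5, sd = 2)

# Exact procedure: use every observation
full_mean <- mean(x)

# Faster alternative: use a 1% random subsample
idx <- sample.int(length(x), size = 1e4)
approx_mean <- mean(x[idx])

# Compare the two estimates
print(c(full = full_mean, approx = approx_mean))
```

The subsample estimate trades some precision for speed. Whether that trade-off is acceptable depends on the statistic and on the application.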

Issues to keep in mind

  • Vectorization.
  • Memory: avoid copying, pre-allocate memory.
  • Use built-in primitive (C) functions (caution: not always faster if the aim is precision).
  • Existing solutions: load additional packages (read.csv() vs. data.table::fread()).
    • Focus of what follows in this course (approach taken in Walkowiak (2016)).
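The first two points can be sketched with three ways of computing the same vector of square roots. This is a minimal illustration in base R; the function names `grow`, `prealloc`, and `vectorized` are made up for this example, and timings will differ between machines.

```r
n <- 1e4

# 1) Growing a vector in a loop: each c() call allocates a new, longer vector
grow <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, sqrt(i))
  out
}

# 2) Pre-allocating the result: memory is reserved once and filled in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- sqrt(i)
  out
}

# 3) Vectorized: the loop runs in compiled code inside sqrt()
vectorized <- function(n) sqrt(seq_len(n))

res_grow <- grow(n)
res_prealloc <- prealloc(n)
res_vectorized <- vectorized(n)

# All three variants return the same values
stopifnot(isTRUE(all.equal(res_grow, res_vectorized)),
          isTRUE(all.equal(res_prealloc, res_vectorized)))

# Compare run times (results depend on the machine)
print(system.time(grow(n)))
print(system.time(prealloc(n)))
print(system.time(vectorized(n)))
```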

Procedural view and further reading

Goals for today

  1. Know basic strategies for out-of-memory operations in R.
  2. Know basic tools for local big data cleaning and transformation in R.
  3. Understand (in simple terms) how these tools work.
  4. (Recap of virtual memory concept)

Virtual Memory

  • The operating system allocates part of the mass storage device (hard disk) as virtual memory.
  • If a process/application uses up too much RAM, the OS starts swapping data between RAM and virtual memory.
  • Processes slow down due to swapping.
  • The default (OS) use of virtual memory is not necessarily optimized for data analysis tasks.
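To judge whether a data set is likely to push R into swapping, a rough estimate of its size in RAM can help. The sketch below assumes a purely numeric data set (8 bytes per double); the dimensions are hypothetical, and real data frames carry additional overhead.

```r
# Hypothetical dimensions of a purely numeric data set
n_rows <- 1e6
n_cols <- 10

# Each double-precision value takes 8 bytes
estimated_bytes <- n_rows * n_cols * 8
estimated_mb <- estimated_bytes / 1024^2
print(estimated_mb)

# Check the rule of thumb against one actual column
x <- numeric(n_rows)
actual_bytes <- as.numeric(object.size(x))
print(actual_bytes)
```

If the estimate is close to or above the available RAM, the out-of-memory strategies in this lecture become relevant.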

Virtual memory: example (Linux)

‘Out-of-memory’ strategies

  • Use the virtual memory idea for specific data analytics tasks.
  • Two approaches:
    • Chunked data files on disk: partition a large data set, then map and store the chunks of raw data on disk. Keep the mapping in RAM. (ff-package)
    • Memory-mapped files and shared memory: virtual memory is explicitly allocated for one or several specific data analytics tasks (different processes can access the same memory segment). (bigmemory-package)
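The chunking idea can be sketched in base R without any additional package: read a file piece by piece and keep only a running aggregate in RAM. This is an illustration of the principle only, not how ff or bigmemory are implemented; the example file, the chunk size, and all variable names are assumptions.

```r
# Create a small example file (stand-in for a data set too large for RAM)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = (1:1000) / 10),
          csv_path, row.names = FALSE)

chunk_size <- 300                      # number of rows held in RAM at a time
con <- file(csv_path, open = "r")

# Read the header line once and strip the quotes written by write.csv()
col_names <- gsub('"', "", strsplit(readLines(con, n = 1), ",")[[1]])

total <- 0
n_rows <- 0
repeat {
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break        # end of file reached
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)
  total <- total + sum(chunk$value)    # keep only the running aggregate
  n_rows <- n_rows + nrow(chunk)
}
close(con)

# Mean of 'value', computed without ever loading the full file
chunked_mean <- total / n_rows
print(chunked_mean)
```

Only one chunk of `chunk_size` rows is in memory at any time, so memory use stays roughly constant regardless of the file size.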

Chunking data with the ff-package

Preparations

# SET UP --------------

# install.packages(c("ff", "ffbase", "pryr"))
# load packages
library(ff)
library(ffbase)
library(pryr)

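# create directory for ff chunks, and assign directory to ff
# (dir.create() works on all platforms and does not fail if the directory exists)
dir.create("ffdf", showWarnings = FALSE)
options(fftempdir = "ffdf")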

Import data, inspect change in RAM.

##             used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells   1393046  74.4    2150848  114.9   2150848  114.9
## Vcells 122509230 934.7  213343342 1627.7 211037840 1610.1
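# import the data in chunks and record the change in memory used by R
mem_change(
  flights <-
    read.table.ffdf(file = "../data/flights.csv",
                    sep = ",",
                    VERBOSE = TRUE,      # report timings for each chunk
                    header = TRUE,
                    next.rows = 100000,  # number of rows read per chunk
                    colClasses = NA)     # let R detect the column classes
)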
## read.table.ffdf 1..100000 (100000)  csv-read=0.458sec ffdf-write=0.075sec
## read.table.ffdf 100001..200000 (100000)  csv-read=0.508sec ffdf-write=0.048sec
## read.table.ffdf 200001..300000 (100000)  csv-read=0.473sec ffdf-write=0.036sec
## read.table.ffdf 300001..336776 (36776)  csv-read=0.184sec ffdf-write=0.019sec
##  csv-read=1.623sec  ffdf-write=0.178sec  TOTAL=1.801sec
## -31.6 MB

Inspect file chunks on disk and data structure in R environment.

# show the files in the directory keeping the chunks
list.files("ffdf")
##    [1] "clone1664b7fbd953f.ff" "clone1664b9b8cca9.ff"  "clone1e7014c0a1cd8.ff"
##    [4] "clone1e7015a4f712e.ff" "clone2aea22211d9e1.ff" "clone2aea2360c6703.ff"
##    [7] "clone2aea2566ab42d.ff" "clone2aea25e1c1f75.ff" "clone2d49618dbfbf6.ff"
##    ... (listing truncated: more than 1,000 further .ff files of the same form, with prefixes such as clone, ff, and ffdf)
## [1102] "ffdf605a831c81792.ff"  "ffdf605a83329302d.ff"  "ffdf605a8349f3d8b.ff" 
## [1105] "ffdf605a834fdeea2.ff"  "ffdf605a836d4c757.ff"  "ffdf605a83818ff8.ff"  
## [1108] "ffdf605a83c74d7e5.ff"  "ffdf605a83eca26e4.ff"  "ffdf605a840450b60.ff" 
## [1111] "ffdf605a840b39e41.ff"  "ffdf605a842b2920c.ff"  "ffdf605a8441ebbfc.ff" 
## [1114] "ffdf605a8455f25ce.ff"  "ffdf605a845f71a45.ff"  "ffdf605a846b158c8.ff" 
## [1117] "ffdf605a847bf9e09.ff"  "ffdf605a848588ce6.ff"  "ffdf605a84aa8157d.ff" 
## [1120] "ffdf605a84bb90c55.ff"  "ffdf605a84c60d8ca.ff"  "ffdf605a84cdd0308.ff" 
## [1123] "ffdf605a84cf82ffd.ff"  "ffdf605a84e4e2191.ff"  "ffdf605a84eeaeb97.ff" 
## [1126] "ffdf605a85489577c.ff"  "ffdf605a85548d511.ff"  "ffdf605a85592c44e.ff" 
## [1129] "ffdf605a8565d0d08.ff"  "ffdf605a856679f78.ff"  "ffdf605a85694cf90.ff" 
## [1132] "ffdf605a8581df3a8.ff"  "ffdf605a859de59b3.ff"  "ffdf605a85a4312e.ff"  
## [1135] "ffdf605a85fe3370b.ff"  "ffdf605a861d4ecf0.ff"  "ffdf605a86234f3f3.ff" 
## [1138] "ffdf605a86827f782.ff"  "ffdf605a86ca561bc.ff"  "ffdf605a86d5d5aa2.ff" 
## [1141] "ffdf605a86db71c84.ff"  "ffdf605a86ea269eb.ff"  "ffdf605a86ea6bd77.ff" 
## [1144] "ffdf605a86f7ed61c.ff"  "ffdf605a86fb0f30d.ff"  "ffdf605a87099b77f.ff" 
## [1147] "ffdf605a870cc6833.ff"  "ffdf605a870d2bf06.ff"  "ffdf605a872205eaf.ff" 
## [1150] "ffdf605a87223f9e3.ff"  "ffdf605a873eba5c3.ff"  "ffdf605a876164c5.ff"  
## [1153] "ffdf605a8786ffe43.ff"  "ffdf605a87a4aa908.ff"  "ffdf605a87b224d2b.ff" 
## [1156] "ffdf605a87b70475a.ff"  "ffdf605a87b8084b1.ff"  "ffdf605a87b8ce332.ff" 
## [1159] "ffdf605a87b96bdd4.ff"  "ffdf605a87d157b92.ff"  "ffdf605a87e0e4a1b.ff" 
## [1162] "ffdf605a89bdffbd.ff"   "ffdf605a8a46c3b3.ff"   "ffdf63f2312834a37.ff" 
## [1165] "ffdf63f23175710e.ff"   "ffdf63f2321bbe54c.ff"  "ffdf63f2322dd4313.ff" 
## [1168] "ffdf63f23230f9854.ff"  "ffdf63f23278ed2e1.ff"  "ffdf63f232a3e4a89.ff" 
## [1171] "ffdf63f232ea4ea5c.ff"  "ffdf63f232fa3c347.ff"  "ffdf63f23321e5017.ff" 
## [1174] "ffdf63f233581f382.ff"  "ffdf63f23405b9b78.ff"  "ffdf63f23418ed964.ff" 
## [1177] "ffdf63f234a1a8596.ff"  "ffdf63f234c78ff61.ff"  "ffdf63f234d7fb7cc.ff" 
## [1180] "ffdf63f234e83df4d.ff"  "ffdf63f234fe41867.ff"  "ffdf63f23501a4cf6.ff" 
## [1183] "ffdf63f23549443f8.ff"  "ffdf63f2354a2210b.ff"  "ffdf63f2356bd32bc.ff" 
## [1186] "ffdf63f235f257b0.ff"   "ffdf63f236088669.ff"   "ffdf63f2361b76da0.ff" 
## [1189] "ffdf63f2366601411.ff"  "ffdf63f236aa10759.ff"  "ffdf63f236d897df6.ff" 
## [1192] "ffdf63f236dfd9483.ff"  "ffdf63f236e544739.ff"  "ffdf63f23763395dd.ff" 
## [1195] "ffdf63f2378a3d4e2.ff"  "ffdf63f237c32bf78.ff"  "ffdf63f23833e6aa.ff"  
## [1198] "ffdf63f23939234e.ff"   "ffdf63f23bac3a50.ff"   "ffdf63f23ce71572.ff"  
## [1201] "ffdf63f23e05a53b.ff"   "ffdfe5f82a783569.ff"   "ffdfe5f82c945547.ff"  
## [1204] "ffdfe5f82cfb64c3.ff"   "ffdfe5f832f28999.ff"   "ffdfe5f8363498f6.ff"  
## [1207] "ffdfe5f836f1079a.ff"   "ffdfe5f837265a47.ff"   "ffdfe5f839fa9647.ff"  
## [1210] "ffdfe5f83af13972.ff"   "ffdfe5f83e6abf2e.ff"   "ffdfe5f85127b5f0.ff"  
## [1213] "ffdfe5f8522d8505.ff"   "ffdfe5f856972132.ff"   "ffdfe5f85c048607.ff"  
## [1216] "ffdfe5f8619c930.ff"    "ffdfe5f86e77171.ff"    "ffdfe5f871edeb45.ff"  
## [1219] "ffdfe5f872a02d0d.ff"   "ffdfe5f8a2a2ba4.ff"    "ffdfe69211cfa687.ff"  
## [1222] "ffdfe69220794987.ff"   "ffdfe69223779874.ff"   "ffdfe69224b0480f.ff"  
## [1225] "ffdfe6922ba4acbe.ff"   "ffdfe692314ae39a.ff"   "ffdfe69231bb4bdf.ff"  
## [1228] "ffdfe69231c31c8a.ff"   "ffdfe69248490454.ff"   "ffdfe6924c4097d5.ff"  
## [1231] "ffdfe69257bdf233.ff"   "ffdfe69257f52986.ff"   "ffdfe69277681f45.ff"  
## [1234] "ffdfe69277f65102.ff"   "ffdfe6927aca4eb8.ff"   "ffdfe6927f7236fa.ff"  
## [1237] "ffdfe69286c18c.ff"     "ffdfe692b9f9dd.ff"     "ffdfe692f0cce99.ff"   
## [1240] "ffdfe6e012cdb66d.ff"   "ffdfe6e0137cf1bb.ff"   "ffdfe6e014acc5cf.ff"  
## [1243] "ffdfe6e014bb58d8.ff"   "ffdfe6e0150d16f9.ff"   "ffdfe6e017c5d78a.ff"  
## [1246] "ffdfe6e019d3028d.ff"   "ffdfe6e01aacee5.ff"    "ffdfe6e01aad4ec.ff"   
## [1249] "ffdfe6e01e311ed3.ff"   "ffdfe6e0208ef38a.ff"   "ffdfe6e020949e00.ff"  
## [1252] "ffdfe6e020b17fa8.ff"   "ffdfe6e020c552fb.ff"   "ffdfe6e020f1cad7.ff"  
## [1255] "ffdfe6e0244f2b55.ff"   "ffdfe6e0275c1905.ff"   "ffdfe6e02aba12ef.ff"  
## [1258] "ffdfe6e02adebe21.ff"   "ffdfe6e02c024f9.ff"    "ffdfe6e02c2db784.ff"  
## [1261] "ffdfe6e030e8c43a.ff"   "ffdfe6e031f2ffc0.ff"   "ffdfe6e0368c74de.ff"  
## [1264] "ffdfe6e03810074c.ff"   "ffdfe6e038c224ec.ff"   "ffdfe6e03900f2a.ff"   
## [1267] "ffdfe6e03dc9b137.ff"   "ffdfe6e03e2e1f7e.ff"   "ffdfe6e042ee25c5.ff"  
## [1270] "ffdfe6e04364e558.ff"   "ffdfe6e04615a916.ff"   "ffdfe6e04798ae7a.ff"  
## [1273] "ffdfe6e04874afde.ff"   "ffdfe6e0491b192.ff"    "ffdfe6e049a5ec12.ff"  
## [1276] "ffdfe6e04ba1c2ec.ff"   "ffdfe6e04da7dcea.ff"   "ffdfe6e04fa4072c.ff"  
## [1279] "ffdfe6e057fe0ef8.ff"   "ffdfe6e05947e84.ff"    "ffdfe6e05aeea3a3.ff"  
## [1282] "ffdfe6e05dfb9abe.ff"   "ffdfe6e060091a53.ff"   "ffdfe6e0615b26fa.ff"  
## [1285] "ffdfe6e0623bfa26.ff"   "ffdfe6e064c63cfb.ff"   "ffdfe6e06533ff5c.ff"  
## [1288] "ffdfe6e0658583c8.ff"   "ffdfe6e066d39039.ff"   "ffdfe6e0673e634a.ff"  
## [1291] "ffdfe6e06888e8e9.ff"   "ffdfe6e069043abe.ff"   "ffdfe6e0693950f8.ff"  
## [1294] "ffdfe6e06c462b1f.ff"   "ffdfe6e06c4c8439.ff"   "ffdfe6e06d6b51c4.ff"  
## [1297] "ffdfe6e06fe235fb.ff"   "ffdfe6e06ffdd938.ff"   "ffdfe6e072656f02.ff"  
## [1300] "ffdfe6e072b937c1.ff"   "ffdfe6e073c0bbbc.ff"   "ffdfe6e074734e9c.ff"  
## [1303] "ffdfe6e074f51d3d.ff"   "ffdfe6e0759f7066.ff"   "ffdfe6e07680d2ce.ff"  
## [1306] "ffdfe6e078e6f04e.ff"   "ffdfe6e07920ea95.ff"   "ffdfe6e07a294419.ff"  
## [1309] "ffdfe6e07acaed8b.ff"   "ffdfe6e07d1122b5.ff"   "ffdfe6e07f6ecb6f.ff"  
## [1312] "ffdfe6e07f83b7bf.ff"   "ffdfe6e0852cbc8.ff"    "ffdfe6e0916f9b9.ff"   
## [1315] "ffdfe6e0e5f403b.ff"    "ffdfe6e0f6eab73.ff"    "ffdfe6e0f71e8dc.ff"   
## [1318] "ffe6e01990eebb.ff"     "ffe6e057919fc7.ff"     "ffe6e07a64a326.ff"
# investigate the structure of the object created in the R environment
summary(flights)
##                Length Class     Mode
## year           336776 ff_vector list
## month          336776 ff_vector list
## day            336776 ff_vector list
## dep_time       336776 ff_vector list
## sched_dep_time 336776 ff_vector list
## dep_delay      336776 ff_vector list
## arr_time       336776 ff_vector list
## sched_arr_time 336776 ff_vector list
## arr_delay      336776 ff_vector list
## carrier        336776 ff_vector list
## flight         336776 ff_vector list
## tailnum        336776 ff_vector list
## origin         336776 ff_vector list
## dest           336776 ff_vector list
## air_time       336776 ff_vector list
## distance       336776 ff_vector list
## hour           336776 ff_vector list
## minute         336776 ff_vector list
## time_hour      336776 ff_vector list

Memory mapping with bigmemory

Preparations

# SET UP ----------------

# load packages
library(bigmemory)
library(biganalytics)

Memory mapping with bigmemory

Import data, inspect change in RAM.

# import the data
flights <- read.big.matrix("../data/flights.csv",
                           type="integer",
                           header=TRUE,
                           backingfile="flights.bin",
                           descriptorfile="flights.desc")

Memory mapping with bigmemory

Inspect the imported data.

summary(flights)
##                          min           max          mean           NAs
## year             2013.000000   2013.000000   2013.000000      0.000000
## month               1.000000     12.000000      6.548510      0.000000
## day                 1.000000     31.000000     15.710787      0.000000
## dep_time            1.000000   2400.000000   1349.109947   8255.000000
## sched_dep_time    106.000000   2359.000000   1344.254840      0.000000
## dep_delay         -43.000000   1301.000000     12.639070   8255.000000
## arr_time            1.000000   2400.000000   1502.054999   8713.000000
## sched_arr_time      1.000000   2359.000000   1536.380220      0.000000
## arr_delay         -86.000000   1272.000000      6.895377   9430.000000
## carrier             9.000000      9.000000      9.000000 318316.000000
## flight              1.000000   8500.000000   1971.923620      0.000000
## tailnum                                                  336776.000000
## origin                                                   336776.000000
## dest                                                     336776.000000
## air_time           20.000000    695.000000    150.686460   9430.000000
## distance           17.000000   4983.000000   1039.912604      0.000000
## hour                1.000000     23.000000     13.180247      0.000000
## minute              0.000000     59.000000     26.230100      0.000000
## time_hour        2013.000000   2014.000000   2013.000261      0.000000

Memory mapping with bigmemory

Inspect the object loaded into the R environment.

flights
## An object of class "big.matrix"
## Slot "address":
## <pointer: 0x558ce76b0af0>

Memory mapping with bigmemory

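  • backingfile: The cache for the imported data: a binary file on disk that holds the values, while R only keeps a pointer to it.
  • descriptorfile: A small file (also on disk) with the metadata needed to attach the data again later, e.g. dimensions, data type, and the name of the backing file.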
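
The division of labour between the two files can be illustrated without bigmemory. The following base-R sketch is only an analogy and not how bigmemory is implemented: the raw values go into a binary file, a small descriptor records the layout, and a single column is later read from disk by jumping to its offset. The file names (toy.bin, toy.desc) and all variable names are made up for illustration.

```r
# toy data: a 4 x 3 integer matrix
m <- matrix(1:12, nrow = 4, ncol = 3)

# "backing file": raw values only, stored column by column (4 bytes each)
backing <- file.path(tempdir(), "toy.bin")
writeBin(as.vector(m), backing, size = 4L)

# "descriptor file": the metadata needed to interpret the raw bytes
descriptor <- file.path(tempdir(), "toy.desc")
dput(list(nrow = nrow(m), ncol = ncol(m), bytes = 4L), file = descriptor)

# later: read only column 3 from disk, using the descriptor to locate it
desc <- dget(descriptor)
con <- file(backing, open = "rb")
seek(con, where = (3 - 1) * desc$nrow * desc$bytes)  # skip columns 1 and 2
col3 <- readBin(con, what = "integer", n = desc$nrow, size = desc$bytes)
close(con)
col3
```

Only the requested column is read into RAM; the rest of the data stays on disk.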

Memory mapping with bigmemory

Understanding the role of backingfile and descriptorfile.

First, import a large data set without a backing-file:

# import data and check time needed  
system.time(
     flights1 <- read.big.matrix("../data/flights.csv",
                                 header = TRUE,
                                 sep = ",",
                                 type = "integer")
)
##    user  system elapsed 
##   1.178   0.040   1.218
# import data and check memory used
mem_change(
     flights1 <- read.big.matrix("../data/flights.csv",
                                 header = TRUE,
                                 sep = ",",
                                 type = "integer")
)
## 528 B
flights1 
## An object of class "big.matrix"
## Slot "address":
## <pointer: 0x558ce110f260>

Memory mapping with bigmemory

Understanding the role of backingfile and descriptorfile.

Second, import the same data set with a backing-file:

# import data and check time needed  
system.time(
     flights2 <- read.big.matrix("../data/flights.csv",
                                 header = TRUE,
                                 sep = ",",
                                 type = "integer",
                                 backingfile = "flights2.bin",
                                 descriptorfile = "flights2.desc"
                                 )
)
##    user  system elapsed 
##   1.190   0.035   1.226
# import data and check memory used
mem_change(
     flights2 <- read.big.matrix("../data/flights.csv",
                                 header = TRUE,
                                 sep = ",",
                                 type = "integer",
                                 backingfile = "flights2.bin",
                                 descriptorfile = "flights2.desc"
                                 )
)
## 528 B
flights2
## An object of class "big.matrix"
## Slot "address":
## <pointer: 0x558ce5a65230>

Memory mapping with bigmemory

Understanding the role of backingfile and descriptorfile.

Third, re-attach the same data set via its descriptor file (the CSV file is not parsed again).

# remove the loaded file
rm(flights2)

# 'load' it via the descriptor file, which points to the backing file
system.time(flights2 <- attach.big.matrix("flights2.desc"))
##    user  system elapsed 
##   0.001   0.000   0.000
flights2
## An object of class "big.matrix"
## Slot "address":
## <pointer: 0x558ceb0ca870>
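
The same pattern can be tried on a small scale. The sketch below assumes that the bigmemory package is installed; the file names demo.bin and demo.desc are hypothetical and are written to a temporary directory. It creates a tiny file-backed matrix, changes one value, removes the R object, and attaches the data again via the descriptor file.

```r
library(bigmemory)

demo_dir <- tempdir()

# create a small file-backed matrix (file names are made up for this sketch)
x <- filebacked.big.matrix(nrow = 3, ncol = 2, type = "integer", init = 0,
                           backingfile = "demo.bin",
                           backingpath = demo_dir,
                           descriptorfile = "demo.desc")
x[2, 1] <- 42L   # modifies the memory-mapped data
flush(x)         # request that pending changes are written to the backing file

# remove the R object; the two files on disk remain
rm(x)
gc()

# re-attach via the descriptor file: nothing is parsed or imported again
y <- attach.big.matrix(file.path(demo_dir, "demo.desc"))
y[2, 1]
dim(y)
```

Because the values live in the backing file, the change made before `rm(x)` should still be visible after attaching.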

Cleaning and Transformation

Typical tasks (independent of data set size)

  • Normalize/standardize.
  • Code additional variables (indicators, strings to categorical, etc.).
  • Remove or add covariates.
  • Merge data sets.
  • Set data types.

Typical workflow

  1. Import raw data.
  2. Clean/transform.
  3. Store for analysis.
    • Write to file.
    • Write to database.

Bottlenecks

  • RAM:
    • Raw data does not fit into memory.
    • Transformations enlarge RAM allocation (copying).
  • Mass Storage: Reading/Writing
  • CPU: Parsing (data types)
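
The role of data types for both RAM and parsing can be made concrete with a few lines of base R. This is a sketch with invented values: the same numbers are stored as integer, double, and character vectors, and a small CSV text is parsed with the column types declared up front via colClasses, so that R does not have to guess them.

```r
n <- 1e5
delay_int <- rep(c(5L, -3L, 12L, 0L), length.out = n)  # 4 bytes per value
delay_dbl <- as.numeric(delay_int)                     # 8 bytes per value
delay_chr <- as.character(delay_int)                   # pointers to strings

# memory footprint of the same information in different types
object.size(delay_int)
object.size(delay_dbl)
object.size(delay_chr)

# declaring column types at import avoids type guessing while parsing
csv_text <- "id,delay\n1,5\n2,-3\n3,12\n"
typed <- read.csv(text = csv_text, colClasses = c("integer", "integer"))
str(typed)
```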

Data Preparation with ff

Set up

The following examples are based on Walkowiak (2016), Chapter 3.

## SET UP ------------------------

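# Set the working directory to the folder with the data and airline_id files.
# setwd("materials/code_book/B05396_Ch03_Code")
# create the directory for the ff files (works on all platforms)
dir.create("ffdf", showWarnings = FALSE)
options(fftempdir = "ffdf")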

# load packages
library(ff)
library(ffbase)
library(pryr)

# fix vars
FLIGHTS_DATA <- "../code_book/B05396_Ch03_Code/flights_sep_oct15.txt"
AIRLINES_DATA <- "../code_book/B05396_Ch03_Code/airline_id.csv"

Data import

# DATA IMPORT ------------------

# 1. Import flights_sep_oct15.txt and airline_id.csv from flat files.

system.time(flights.ff <- read.table.ffdf(file=FLIGHTS_DATA,
                                          sep=",",
                                          VERBOSE=TRUE,
                                          header=TRUE,
                                          next.rows=100000,
                                          colClasses=NA))
## read.table.ffdf 1..100000 (100000)  csv-read=0.547sec ffdf-write=0.07sec
## read.table.ffdf 100001..200000 (100000)  csv-read=0.584sec ffdf-write=0.061sec
## read.table.ffdf 200001..300000 (100000)  csv-read=0.579sec ffdf-write=0.062sec
## read.table.ffdf 300001..400000 (100000)  csv-read=0.582sec ffdf-write=0.062sec
## read.table.ffdf 400001..500000 (100000)  csv-read=0.585sec ffdf-write=0.057sec
## read.table.ffdf 500001..600000 (100000)  csv-read=0.576sec ffdf-write=0.047sec
## read.table.ffdf 600001..700000 (100000)  csv-read=0.572sec ffdf-write=0.042sec
## read.table.ffdf 700001..800000 (100000)  csv-read=0.548sec ffdf-write=0.051sec
## read.table.ffdf 800001..900000 (100000)  csv-read=0.587sec ffdf-write=0.046sec
## read.table.ffdf 900001..951111 (51111)  csv-read=0.305sec ffdf-write=0.052sec
##  csv-read=5.465sec  ffdf-write=0.55sec  TOTAL=6.015sec
##    user  system elapsed 
##   5.821   0.195   6.017
airlines.ff <- read.csv.ffdf(file=AIRLINES_DATA,
                             VERBOSE=TRUE,
                             header=TRUE,
                             next.rows=100000,
                             colClasses=NA)
## read.table.ffdf 1..1607 (1607)  csv-read=0.005sec ffdf-write=0.004sec
##  csv-read=0.005sec  ffdf-write=0.004sec  TOTAL=0.009sec
# check memory used
mem_used()
## 1,026,137,512 B

Comparison with read.table

## Using read.table()
system.time(flights.table <- read.table(FLIGHTS_DATA, 
                                        sep=",",
                                        header=TRUE))
##    user  system elapsed 
##   5.150   0.223   5.387
gc()
##             used   (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells   1396861   74.7    2150848  114.9   2150848  114.9
## Vcells 136560693 1041.9  213343342 1627.7 212430407 1620.8
system.time(airlines.table <- read.csv(AIRLINES_DATA,
                                       header = TRUE))
##    user  system elapsed 
##   0.002   0.000   0.002
# check memory used
mem_used()
## 1,170,732,976 B

Inspect imported files

# 2. Inspect the ffdf objects.
## For flights.ff object:
class(flights.ff)
## [1] "ffdf"
dim(flights.ff)
## [1] 951111     28
## For airlines.ff object:
class(airlines.ff)
## [1] "ffdf"
dim(airlines.ff)
## [1] 1607    2

Data cleaning and transformation

Goal: merge the airline data with the flights data.

# step 1: 
## Rename "Code" variable from airlines.ff to "AIRLINE_ID" and "Description" into "AIRLINE_NM".
names(airlines.ff) <- c("AIRLINE_ID", "AIRLINE_NM")
names(airlines.ff)
## [1] "AIRLINE_ID" "AIRLINE_NM"
str(airlines.ff[1:20,])
## 'data.frame':    20 obs. of  2 variables:
##  $ AIRLINE_ID: int  19031 19032 19033 19034 19035 19036 19037 19038 19039 19040 ...
##  $ AIRLINE_NM: Factor w/ 1607 levels "40-Mile Air: Q5",..: 945 1025 503 721 64 725 1194 99 1395 276 ...

Data cleaning and transformation

Goal: merge the airline data with the flights data.

# merge of ffdf objects
mem_change(flights.data.ff <- merge.ffdf(flights.ff, airlines.ff, by="AIRLINE_ID"))
## 780 kB
class(flights.data.ff)
## [1] "ffdf"
dim(flights.data.ff)
## [1] 951111     29
dimnames(flights.data.ff)
## [[1]]
## NULL
## 
## [[2]]
##  [1] "YEAR"              "MONTH"             "DAY_OF_MONTH"      "DAY_OF_WEEK"      
##  [5] "FL_DATE"           "UNIQUE_CARRIER"    "AIRLINE_ID"        "TAIL_NUM"         
##  [9] "FL_NUM"            "ORIGIN_AIRPORT_ID" "ORIGIN"            "ORIGIN_CITY_NAME" 
## [13] "ORIGIN_STATE_NM"   "ORIGIN_WAC"        "DEST_AIRPORT_ID"   "DEST"             
## [17] "DEST_CITY_NAME"    "DEST_STATE_NM"     "DEST_WAC"          "DEP_TIME"         
## [21] "DEP_DELAY"         "ARR_TIME"          "ARR_DELAY"         "CANCELLED"        
## [25] "CANCELLATION_CODE" "DIVERTED"          "AIR_TIME"          "DISTANCE"         
## [29] "AIRLINE_NM"

Compare with the in-memory operation

## For airlines.table:
names(airlines.table) <- c("AIRLINE_ID", "AIRLINE_NM")
names(airlines.table)
## [1] "AIRLINE_ID" "AIRLINE_NM"
str(airlines.table[1:20,])
## 'data.frame':    20 obs. of  2 variables:
##  $ AIRLINE_ID: int  19031 19032 19033 19034 19035 19036 19037 19038 19039 19040 ...
##  $ AIRLINE_NM: chr  "Mackey International Inc.: MAC" "Munz Northern Airlines Inc.: XY" "Cochise Airlines Inc.: COC" "Golden Gate Airlines Inc.: GSA" ...
# check memory usage of merge in RAM 
mem_change(flights.data.table <- merge(flights.table,
                                       airlines.table,
                                       by="AIRLINE_ID"))
## 160 MB
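
For the in-memory case, part of the extra RAM is used because merge() builds a completely new data frame. When the task is a simple key lookup as here, one alternative is to add the looked-up column to the existing table via match(). The sketch below uses invented toy data and only illustrates the idea for ordinary data frames; it makes no claim about how merge.ffdf() works internally.

```r
# toy data (invented IDs and names)
flights_toy <- data.frame(AIRLINE_ID = c(2L, 1L, 2L, 3L),
                          DEP_DELAY  = c(4L, -2L, 31L, 0L))
airlines_toy <- data.frame(AIRLINE_ID = c(1L, 2L, 3L),
                           AIRLINE_NM = c("Airline A", "Airline B", "Airline C"),
                           stringsAsFactors = FALSE)

# variant 1: merge() returns a new data frame (sorted by the key)
merged <- merge(flights_toy, airlines_toy, by = "AIRLINE_ID")

# variant 2: look up the position of each key, then add a single column
idx <- match(flights_toy$AIRLINE_ID, airlines_toy$AIRLINE_ID)
flights_toy$AIRLINE_NM <- airlines_toy$AIRLINE_NM[idx]
flights_toy
```

Note that merge() reorders the rows by the key, whereas the lookup keeps the original row order.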

Subsetting

mem_used()
## 1,331,359,336 B
# Subset the ffdf object flights.data.ff:
subs1.ff <- subset.ffdf(flights.data.ff, CANCELLED == 1, 
                        select = c(FL_DATE, AIRLINE_ID, 
                                   ORIGIN_CITY_NAME,
                                   ORIGIN_STATE_NM,
                                   DEST_CITY_NAME,
                                   DEST_STATE_NM,
                                   CANCELLATION_CODE))

dim(subs1.ff)
## [1] 4529    7
mem_used()
## 1,331,642,360 B

Save to ffdf-files

(For further processing with ff)

# Save a newly created ffdf object to a data file:

save.ffdf(subs1.ff, overwrite = TRUE) # 7 files (one per column) are created in the default directory ./ffdb

Load ffdf-files

# Loading previously saved ffdf files:
rm(subs1.ff)
gc()
##             used   (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells   1417352   75.7    4483604  239.5   3298928  176.2
## Vcells 156550988 1194.4  256092010 1953.9 212430407 1620.8
load.ffdf("ffdb")
str(subs1.ff)
## List of 3
##  $ virtual: 'data.frame':    7 obs. of  7 variables:
##  .. $ VirtualVmode     : chr  "integer" "integer" "integer" "integer" ...
##  .. $ AsIs             : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  .. $ VirtualIsMatrix  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  .. $ PhysicalIsMatrix : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  .. $ PhysicalElementNo: int  1 2 3 4 5 6 7
##  .. $ PhysicalFirstCol : int  1 1 1 1 1 1 1
##  .. $ PhysicalLastCol  : int  1 1 1 1 1 1 1
##  .. - attr(*, "Dim")= int [1:2] 4529 7
##  .. - attr(*, "Dimorder")= int [1:2] 1 2
##  $ physical: List of 7
##  .. $ FL_DATE          : list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$FL_DATE.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  ..  .. ..- attr(*, "Levels")= chr [1:61] "2015-09-01" "2015-09-02" "2015-09-03" "2015-09-04" ...
##  ..  .. ..- attr(*, "ramclass")= chr "factor"
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  .. $ AIRLINE_ID       : list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$AIRLINE_ID.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  .. $ ORIGIN_CITY_NAME : list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$ORIGIN_CITY_NAME.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  ..  .. ..- attr(*, "Levels")= chr [1:305] "Abilene, TX" "Akron, OH" "Albany, GA" "Albany, NY" ...
##  ..  .. ..- attr(*, "ramclass")= chr "factor"
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  .. $ ORIGIN_STATE_NM  : list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$ORIGIN_STATE_NM.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  ..  .. ..- attr(*, "Levels")= chr [1:52] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  ..  .. ..- attr(*, "ramclass")= chr "factor"
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  .. $ DEST_CITY_NAME   : list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$DEST_CITY_NAME.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  ..  .. ..- attr(*, "Levels")= chr [1:306] "Abilene, TX" "Akron, OH" "Albany, GA" "Albany, NY" ...
##  ..  .. ..- attr(*, "ramclass")= chr "factor"
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  .. $ DEST_STATE_NM    : list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$DEST_STATE_NM.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  ..  .. ..- attr(*, "Levels")= chr [1:52] "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  ..  .. ..- attr(*, "ramclass")= chr "factor"
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  .. $ CANCELLATION_CODE: list()
##  ..  ..- attr(*, "physical")=Class 'ff_pointer' <externalptr> 
##  ..  .. ..- attr(*, "vmode")= chr "integer"
##  ..  .. ..- attr(*, "maxlength")= int 4529
##  ..  .. ..- attr(*, "pattern")= chr "ffdf"
##  ..  .. ..- attr(*, "filename")= chr "/home/umatter/Dropbox/Teaching/HSG/BigData/BigData/materials/slides/ffdb/subs1.ff$CANCELLATION_CODE.ff"
##  ..  .. ..- attr(*, "pagesize")= int 65536
##  ..  .. ..- attr(*, "finalizer")= chr "close"
##  ..  .. ..- attr(*, "finonexit")= logi TRUE
##  ..  .. ..- attr(*, "readonly")= logi FALSE
##  ..  .. ..- attr(*, "caching")= chr "mmnoflush"
##  ..  ..- attr(*, "virtual")= list()
##  ..  .. ..- attr(*, "Length")= int 4529
##  ..  .. ..- attr(*, "Symmetric")= logi FALSE
##  ..  .. ..- attr(*, "Levels")= chr [1:4] "" "A" "B" "C"
##  ..  .. ..- attr(*, "ramclass")= chr "factor"
##  .. .. - attr(*, "class") =  chr [1:2] "ff_vector" "ff"
##  $ row.names:  NULL
## - attributes: List of 2
##  .. $ names: chr [1:2] "virtual" "physical"
##  .. $ class: chr "ffdf"
dim(subs1.ff)
## [1] 4529    7
dimnames(subs1.ff)
## [[1]]
## NULL
## 
## [[2]]
## [1] "FL_DATE"           "AIRLINE_ID"        "ORIGIN_CITY_NAME"  "ORIGIN_STATE_NM"  
## [5] "DEST_CITY_NAME"    "DEST_STATE_NM"     "CANCELLATION_CODE"

Export to CSV

# Export subs1.ff to a CSV file:
write.csv.ffdf(subs1.ff, "subset1.csv")

References

Walkowiak, Simon. 2016. Big Data Analytics with R. Birmingham, UK: Packt Publishing.

Wickham, Hadley. 2019. Advanced R. Second Edition. Boca Raton, FL: CRC Press.